Efficient Concurrent Programming with Python's ThreadPoolExecutor and as_completed

Published: August 28, 2023 | Author: Vispi Nevile Karkaria

How ThreadPoolExecutor and as_completed Can Help

ThreadPoolExecutor and as_completed in Python's concurrent.futures library offer an efficient way to manage concurrent tasks. They enable:

- Running many I/O-bound tasks (network requests, file reads) concurrently over a reusable pool of worker threads
- Submitting tasks and collecting their results through Future objects
- Processing each result as soon as its task finishes, rather than waiting for the whole batch

Python Code Example

To demonstrate, here is a simple Python code snippet using ThreadPoolExecutor and as_completed:


        from concurrent.futures import ThreadPoolExecutor, as_completed

        def worker_function(x):
            return x * x

        # Create a ThreadPoolExecutor
        with ThreadPoolExecutor() as executor:
            # Map each future back to the input it was submitted with
            futures = {executor.submit(worker_function, i): i for i in range(5)}
            # Process results in completion order, not submission order
            for future in as_completed(futures):
                print(f'{futures[future]} squared is {future.result()}')

The code above creates a ThreadPoolExecutor, submits five tasks to it, and prints each result as soon as its task finishes. Because as_completed yields futures in completion order rather than submission order, results may arrive out of order; the futures dictionary lets us recover which input each result belongs to.
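
To see the completion-order behavior directly, here is a minimal sketch in which the longest task is submitted first; the sleep durations are arbitrary, chosen only to force the tasks to finish out of order:

        import time
        from concurrent.futures import ThreadPoolExecutor, as_completed

        def slow_worker(delay):
            time.sleep(delay)
            return delay

        with ThreadPoolExecutor() as executor:
            # The 0.3-second task is submitted first but finishes last
            futures = [executor.submit(slow_worker, d) for d in (0.3, 0.2, 0.1)]
            for future in as_completed(futures):
                print(future.result())  # prints 0.1, then 0.2, then 0.3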

Use Cases

Data Scraping

ThreadPoolExecutor can significantly expedite data scraping tasks by running multiple scrapers in parallel, thus reducing the time it takes to retrieve data from multiple sources.

Python Code Example


        from concurrent.futures import ThreadPoolExecutor
        import requests

        def fetch_data(url):
            return requests.get(url).text

        urls = ['https://example.com/page1', 'https://example.com/page2']

        # Create a ThreadPoolExecutor
        with ThreadPoolExecutor() as executor:
            # Fetch data from multiple URLs in parallel
            results = list(executor.map(fetch_data, urls))
        

This example demonstrates how to fetch two URLs concurrently. The `executor.map()` call applies `fetch_data` to each URL in the `urls` list and returns the results in the same order as the inputs, regardless of which request finishes first.
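
Real scraping targets can be slow or fail outright, so it often pays to handle each page as it arrives and catch failures per URL. Here is a hedged variant of the example above using submit and as_completed; the URLs are the same placeholders, and the 10-second request timeout is an illustrative assumption:

        from concurrent.futures import ThreadPoolExecutor, as_completed
        import requests

        def fetch_data(url):
            # The timeout value is an assumption; tune it for your sources
            response = requests.get(url, timeout=10)
            response.raise_for_status()
            return response.text

        urls = ['https://example.com/page1', 'https://example.com/page2']

        with ThreadPoolExecutor() as executor:
            # Map each future back to its URL for reporting
            futures = {executor.submit(fetch_data, url): url for url in urls}
            for future in as_completed(futures):
                url = futures[future]
                try:
                    print(f'{url}: fetched {len(future.result())} characters')
                except requests.RequestException as exc:
                    print(f'{url} failed: {exc}')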

Image Processing

Tasks like resizing, filtering, and transformation can be parallelized using ThreadPoolExecutor to improve the overall processing speed. This is particularly beneficial when you have to process a large number of images.

Python Code Example


        from PIL import Image
        from concurrent.futures import ThreadPoolExecutor

        def resize_image(image_path, output_path):
            img = Image.open(image_path)
            img = img.resize((300, 300))
            img.save(output_path)

        image_paths = ['image1.jpg', 'image2.jpg', 'image3.jpg']
        output_paths = ['image1_resized.jpg', 'image2_resized.jpg', 'image3_resized.jpg']

        with ThreadPoolExecutor() as executor:
            executor.map(resize_image, image_paths, output_paths)
        

This Python snippet uses the Pillow library to resize images concurrently; `executor.map()` submits one resize task per input/output pair. One caveat: because the iterator returned by `map()` is never consumed here, any exception raised inside `resize_image` (a missing file, for example) would pass silently.
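
If you want per-image feedback, or to surface worker exceptions, submit and as_completed work just as well. A sketch, assuming the same `resize_image` function and file lists as above (the max_workers value of 4 is an arbitrary choice):

        from concurrent.futures import ThreadPoolExecutor, as_completed

        with ThreadPoolExecutor(max_workers=4) as executor:
            # Map each future back to its source path for reporting
            futures = {
                executor.submit(resize_image, src, dst): src
                for src, dst in zip(image_paths, output_paths)
            }
            for future in as_completed(futures):
                future.result()  # re-raises any exception from the worker
                print(f'Resized {futures[future]}')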

API Calls

Making API calls sequentially can be a bottleneck in your application. By using ThreadPoolExecutor, you can make multiple API requests in parallel, thus saving a considerable amount of time.

Python Code Example


        import requests
        from concurrent.futures import ThreadPoolExecutor

        def fetch_data_from_api(api_url):
            return requests.get(api_url).json()

        api_urls = ['https://api.example.com/data1', 'https://api.example.com/data2']

        with ThreadPoolExecutor() as executor:
            results = list(executor.map(fetch_data_from_api, api_urls))
        

The above code demonstrates how to make API requests concurrently using ThreadPoolExecutor. The `executor.map()` function concurrently fetches data from the list of API URLs and stores the results in a list.
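
APIs can also hang, so it is worth bounding how long you wait for the whole batch. as_completed accepts a timeout parameter and raises TimeoutError if any future is still pending when the deadline passes. A sketch, assuming the same placeholder URLs; the 5-second per-request and 15-second batch budgets are illustrative assumptions:

        import requests
        from concurrent.futures import ThreadPoolExecutor, as_completed, TimeoutError

        def fetch_data_from_api(api_url):
            # Per-request timeout is an assumption; tune it for your API
            return requests.get(api_url, timeout=5).json()

        api_urls = ['https://api.example.com/data1', 'https://api.example.com/data2']

        results = {}
        with ThreadPoolExecutor() as executor:
            futures = {executor.submit(fetch_data_from_api, url): url for url in api_urls}
            try:
                # Wait at most 15 seconds for the whole batch
                for future in as_completed(futures, timeout=15):
                    results[futures[future]] = future.result()
            except TimeoutError:
                print('Some API calls did not finish within the time budget')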

Tips and Caveats

- Because of CPython's Global Interpreter Lock (GIL), threads speed up I/O-bound work (network, disk) far more than CPU-bound work; for CPU-bound tasks, consider ProcessPoolExecutor instead.
- If you omit max_workers, recent Python versions default to min(32, os.cpu_count() + 4); tune it explicitly when your tasks are network-bound or when you need to limit load on a remote service.
- Exceptions raised inside a worker are captured and only re-raised when you call future.result() (or inspect future.exception()), so always consume results, even for fire-and-forget tasks.
- executor.map() returns results in input order; as_completed() yields them in completion order. Choose based on whether you need ordering or responsiveness.
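
The exception behavior is the caveat that most often surprises people, so here is a minimal sketch of it: the ValueError raised inside the worker does not appear at submission time, only when result() is called on the corresponding future.

        from concurrent.futures import ThreadPoolExecutor, as_completed

        def may_fail(x):
            if x == 2:
                raise ValueError('bad input')
            return x

        with ThreadPoolExecutor() as executor:
            futures = [executor.submit(may_fail, i) for i in range(4)]
            for future in as_completed(futures):
                try:
                    print(future.result())
                except ValueError as exc:
                    print(f'Task failed: {exc}')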

Conclusion

ThreadPoolExecutor and as_completed offer an efficient and convenient way to handle concurrent programming tasks in Python. By understanding their capabilities and limitations, you can significantly improve the performance and efficiency of various applications, be it data scraping, image processing, or API interactions.

I encourage you to experiment with these tools. For I/O-bound workloads, thread-based concurrency like this is often all you need; for CPU-bound workloads, process-based parallelism with ProcessPoolExecutor is usually the better fit.

Further Reading

The official concurrent.futures documentation: https://docs.python.org/3/library/concurrent.futures.html
